Building the Feminist Dataset: Archives, Criteria, and the Politics of Curation

Every AI model is only as good — or as biased — as the data it was trained on. This is why data curation is not a technical footnote in my project. It is one of the central methodological and political acts.

Caroline Criado Perez has shown how the absence of women's data produces a world designed for men — from seatbelts calibrated to male bodies to urban planning that ignores care work. Caroline Sinders' Feminist Data Set project (2017) provides a practical framework for creating datasets that are intentionally, transparently feminist — datasets that document their own biases, gaps, and decisions.

For Spatial Configurations, data will be collected from three primary sources: the Matrix Open Feminist Architecture Archive (MOfAA), the International Archive of Women in Architecture (IAWA) at Virginia Tech, and the Women Writing Architecture collection at ETH Zürich/Graubünden. These archives contain a wealth of visual and textual material — drawings, photographs, plans, writings — by architects who have been systematically underrepresented in canonical architectural history.

But collecting data is not enough. The dataset must also be annotated. Using feminist theoretical frameworks, each spatial element will be classified along a spectrum from "inclusive" to "exclusive," so that the annotations, and not only the raw material, carry semantic and conceptual meaning into the AI training process. This method is grounded in what Hermans and Schlesinger (2023) call a feminist critique of technological design: the process of building the tool must itself be a reflective and inclusive practice.
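To make the idea of an annotation on a spectrum more concrete, here is a minimal sketch of what a single annotation record might look like. Everything in it is an assumption for illustration: the field names, the 0.0 to 1.0 scale, the intermediate "mixed" band, and the example values are placeholders, not the project's final schema or real archive data.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SpatialAnnotation:
    """One annotated spatial element (hypothetical schema)."""
    source_archive: str  # e.g. "MOfAA" or "IAWA"
    item_id: str         # placeholder identifier, not a real catalogue number
    element: str         # the spatial element being classified
    inclusivity: float   # assumed scale: 0.0 = "exclusive", 1.0 = "inclusive"
    rationale: str       # the annotator's written reasoning

    def __post_init__(self):
        # Reject scores that fall outside the assumed spectrum.
        if not 0.0 <= self.inclusivity <= 1.0:
            raise ValueError("inclusivity must lie between 0.0 and 1.0")
        # An annotation without a stated reason hides the decision behind it.
        if not self.rationale.strip():
            raise ValueError("every annotation needs a written rationale")


def spectrum_label(score: float) -> str:
    """Map a score to a coarse label; the three equal bands are an assumption."""
    if score < 1 / 3:
        return "exclusive"
    if score < 2 / 3:
        return "mixed"
    return "inclusive"


# Invented placeholder values, for illustration only.
example = SpatialAnnotation(
    source_archive="IAWA",
    item_id="example-001",
    element="shared kitchen",
    inclusivity=0.8,
    rationale="Placeholder rationale written by the annotator.",
)
print(spectrum_label(example.inclusivity))
```

Requiring a written rationale for every score is a deliberate choice in this sketch: it keeps the annotator's reasoning visible next to the number, in line with the aim of documenting decisions rather than hiding them.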

The documentation follows the Datasheets for Datasets framework proposed by Timnit Gebru and colleagues (Gebru et al., 2021): every dataset must be accompanied by a document describing its motivation, composition, collection process, recommended uses, and known limitations. This is not bureaucracy; it is accountability. For a project that argues for feminist data ethics, it is essential that the dataset itself models those ethics.
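As a rough sketch of how such a datasheet could travel with the dataset, the five sections named above can be held in a small structure that reports which sections are still empty and renders itself as plain text. The class and method names are hypothetical, and the section list is limited to the five mentioned here rather than the full questionnaire in the published framework.

```python
from dataclasses import dataclass, fields


@dataclass
class Datasheet:
    """Plain-text datasheet with the five sections named in this entry."""
    motivation: str = ""
    composition: str = ""
    collection_process: str = ""
    recommended_uses: str = ""
    known_limitations: str = ""

    def missing_sections(self) -> list[str]:
        # Names of sections that are empty or contain only whitespace.
        return [f.name for f in fields(self)
                if not getattr(self, f.name).strip()]

    def render(self) -> str:
        # One titled block per section, with gaps marked explicitly.
        blocks = []
        for f in fields(self):
            title = f.name.replace("_", " ").capitalize()
            body = getattr(self, f.name).strip() or "[not yet documented]"
            blocks.append(f"{title}\n{body}")
        return "\n\n".join(blocks)


# Placeholder text, for illustration only.
sheet = Datasheet(motivation="Placeholder statement of why the dataset exists.")
print(sheet.missing_sections())
print(sheet.render())
```

Marking undocumented sections explicitly, instead of omitting them, means the gaps in the documentation are themselves visible to anyone who reads it.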

In the next entry, I will detail the specific inclusion and exclusion criteria I am developing for the dataset, and the challenges of annotating architectural images through a feminist lens.

References:
Criado Perez, C. (2019). Invisible Women. London: Chatto and Windus.
Sinders, C. (2017). Feminist Data Set. https://carolinesinders.com/feminist-data-set/
Hermans, F. and Schlesinger, A. (2023). A Case for Feminism in Programming Language Design. CHI '23.
Gebru, T. et al. (2021). Datasheets for Datasets. Communications of the ACM.
D'Ignazio, C. and Klein, L. F. (2023). Data Feminism. Cambridge: MIT Press.
